Ensure EPP flags are configurable via Helm chart #1302
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: rahulgurnani. It still needs approval from an approver for each of the affected files.
Welcome @rahulgurnani!
Hi @rahulgurnani. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. Once the patch is verified, the new status will be reflected. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
- "--destination-endpoint-hint-key={{ .Values.inferenceExtension.destinationEndpointHintKey }}"
- "--destination-endpoint-hint-metadata-namespace={{ .Values.inferenceExtension.destinationEndpointHintMetadataNamespace }}"
- "--fairness-id-header-key={{ .Values.inferenceExtension.fairnessIDHeaderKey }}"
These three flags should be removed (see PR #1296).
destinationEndpointHintMetadataNamespace: "envoy.lb"
destinationEndpointHintKey: "x-gateway-destination-endpoint"
fairnessIDHeaderKey: "x-gateway-inference-fairness-id"
ditto
/ok-to-test
extraContainerPorts: []
# Define additional service ports
extraServicePorts: []

inferencePool:
  targetPortNumber: 8000
  modelServerType: vllm # vllm, triton-tensorrt-llm
  # modelServers: # REQUIRED
Please revert this change; we should not default this, it should be explicitly set.
Was this comment addressed?
Yes, this is addressed; it is now commented out, as it should be.
@rahulgurnani but please add back the `# REQUIRED` comment.
@rahulgurnani can you please update the configuration options in the chart readme?
force-pushed from a7ac10b to 3c28905
force-pushed from 3c28905 to 2e57ea2
| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
| `inferenceExtension.logVerbosity` | Logging verbosity level for the endpoint picker. Defaults to `"3"`. |
| `inferenceExtension.enablePprof` | Enables pprof for profiling and debugging |
| `inferenceExtension.modelServerMetricsPath` | Flag to have model server metrics |
These descriptions (modelServerMetricsScheme/Path/Port) are a little vague, could we add more detail?
| `inferenceExtension.modelServerMetricsPort` | Flag for have model server metrics port |
| `inferenceExtension.modelServerMetricsHttpsInsecureSkipVerify` | When using 'https' scheme for 'model-server-metrics-scheme', configure 'InsecureSkipVerify' (default to true) |
| `inferenceExtension.secureServing` | Enables secure serving. Defaults to true. |
| `inferenceExtension.healthChecking` | Enables health checking |
Specify what the default is.
| `inferenceExtension.certPath` | The path to the certificate for secure serving. The certificate and private key files are assumed to be named tls.crt and tls.key, respectively. If not set, and secureServing is enabled, then a self-signed certificate is used. |
| `inferenceExtension.refreshMetricsInterval` | Interval to refresh metrics |
| `inferenceExtension.refreshPrometheusMetricsInterval` | Interval to flush prometheus metrics |
| `inferenceExtension.metricsStalenessThreshold` | Duration after which metrics are considered stale. This is used to determine if a pod's metrics are fresh enough. |
"This is used to determine if a pod's metrics are fresh enough." Consider rewording to something like: 'metrics staleness above the configured threshold will be considered invalid'
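The semantics the reviewer suggests can be sketched roughly as follows (a hypothetical helper for illustration, not the EPP's actual code):

```python
from datetime import datetime, timedelta

def metrics_fresh(last_updated: datetime, now: datetime,
                  threshold: timedelta) -> bool:
    """Return True if a pod's metrics are recent enough to be trusted.

    Metrics whose staleness exceeds the configured threshold are
    considered invalid, per the suggested wording above.
    """
    return (now - last_updated) <= threshold

now = datetime(2025, 1, 1, 12, 0, 0)
print(metrics_fresh(now - timedelta(seconds=1), now, timedelta(seconds=3)))   # True
print(metrics_fresh(now - timedelta(seconds=10), now, timedelta(seconds=3)))  # False
```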
| `inferenceExtension.refreshPrometheusMetricsInterval` | Interval to flush prometheus metrics |
| `inferenceExtension.metricsStalenessThreshold` | Duration after which metrics are considered stale. This is used to determine if a pod's metrics are fresh enough. |
| `inferenceExtension.totalQueuedRequestsMetric` | Prometheus metric for the number of queued requests. |
| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
Let's clarify that these extra ports are for the EPP itself.
| `inferenceExtension.enablePprof` | Enables pprof for profiling and debugging |
| `inferenceExtension.modelServerMetricsPath` | Flag to have model server metrics |
| `inferenceExtension.modelServerMetricsScheme` | Flag to have model server metrics scheme |
| `inferenceExtension.modelServerMetricsPort` | Flag for have model server metrics port |
It's worth specifying that if this port is unset, it defaults to the target port specified on the InferencePool.
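For example, an override in a user's values file could look like this (the keys come from the README table under review; the values are illustrative only):

```yaml
inferenceExtension:
  # Scrape model-server metrics on a dedicated port; if unset, the
  # EPP falls back to the InferencePool target port (per the review above).
  modelServerMetricsPort: 9090
  modelServerMetricsScheme: https
  modelServerMetricsHttpsInsecureSkipVerify: true
```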
extraContainerPorts: []
# Define additional service ports
extraServicePorts: []

inferencePool:
  targetPortNumber: 8000
  modelServerType: vllm # vllm, triton-tensorrt-llm
  # modelServers: # REQUIRED
Was this comment addressed?
Can we make the flags generic, similar to the env vars? This way we don't have to keep updating the chart if the EPP flags change.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Co-authored-by: Kellen Swain <[email protected]>
Yes. This would mean adding the flags as part of the env section in values.yaml. Most of the flags are already read in runner.go today. I think we can make this change, and it resonates with me. What do others think about it? @kfswain @ahg-g @nirrozenbaum ?
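A rough sketch of what such a generic approach could look like, assuming a hypothetical `flags` list in values.yaml (not the chart's current schema) that the deployment template renders into container args:

```yaml
# values.yaml (hypothetical schema)
inferenceExtension:
  flags:
    - name: log-verbosity
      value: "3"
    - name: secure-serving
      value: "true"
```

```yaml
# templates/epp-deployment.yaml (sketch)
args:
  {{- range .Values.inferenceExtension.flags }}
  - "--{{ .name }}={{ .value }}"
  {{- end }}
```

With this shape, adding a new EPP flag only requires a values change, not a chart template change.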
force-pushed from 366c1da to b6dcd34
Keywords which can automatically close issues and at (@) or hashtag (#) mentions are not allowed in commit messages. The list of commits with invalid commit messages: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@rahulgurnani: The following tests failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Just to clarify, I was only asking whether it's possible to make the flags generic in Helm, like the env vars (I never tried, but I hope so). I didn't mean converting all flags to env vars (I don't think we should) in order to do that.
Fixes issue #1207: make all flags in EPP configurable via Helm charts.